10 research outputs found

    Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

    Full text link
    Markov decision processes (MDPs) are the defacto frame-work for sequential decision making in the presence ofstochastic uncertainty. A classical optimization criterion forMDPs is to maximize the expected discounted-sum pay-off, which ignores low probability catastrophic events withhighly negative impact on the system. On the other hand,risk-averse policies require the probability of undesirableevents to be below a given threshold, but they do not accountfor optimization of the expected payoff. We consider MDPswith discounted-sum payoff with failure states which repre-sent catastrophic outcomes. The objective of risk-constrainedplanning is to maximize the expected discounted-sum payoffamong risk-averse policies that ensure the probability to en-counter a failure state is below a desired threshold. Our maincontribution is an efficient risk-constrained planning algo-rithm that combines UCT-like search with a predictor learnedthrough interaction with the MDP (in the style of AlphaZero)and with a risk-constrained action selection via linear pro-gramming. We demonstrate the effectiveness of our approachwith experiments on classical MDPs from the literature, in-cluding benchmarks with an order of 10^6 states.Comment: Published on AAAI 202

    Verification of Open Interactive Markov Chains

    Get PDF
    Interactive Markov chains (IMC) are compositional behavioral models extending both labeled transition systems and continuous-time Markov chains. IMC pair modeling convenience - owed to compositionality properties - with effective verification algorithms and tools - owed to Markov properties. Thus far however, IMC verification did not consider compositionality properties, but considered closed systems. This paper discusses the evaluation of IMC in an open and thus compositional interpretation. For this we embed the IMC into a game that is played with the environment. We devise algorithms that enable us to derive bounds on reachability probabilities that are assured to hold in any composition context

    Analysis and Prediction of the Long-Run Behavior of Probabilistic Sequential Programs with Recursion (Extended Abstract)

    No full text
    We introduce a family of long-run average properties of Markov chains that are useful for purposes of performance and reliability analysis, and show that these properties can effectively be checked for a subclass of infinite-state Markov chains generated by probabilistic programs with recursive procedures. We also show how to predict these properties by analyzing finite prefixes of runs, and present an efficient prediction algorithm for the mentioned subclass of Markov chains

    On the Memory Consumption of Probabilistic Pushdown Automata

    Get PDF
    We investigate the problem of evaluating memory consumption for systems modelled by probabilistic pushdown automata (pPDA). The space needed by a runof a pPDA is the maximal height reached by the stack during the run. Theproblem is motivated by the investigation of depth-first computations that playan important role for space-efficient schedulings of multithreaded programs. We study the computation of both the distribution of the memory consumption and its expectation. For the distribution, we show that a naive method incurs anexponential blow-up, and that it can be avoided using linear equation systems.We also suggest a possibly even faster approximation method.Given~varepsilon>0varepsilon>0, these methods allow to compute bounds on the memoryconsumption that are exceeded with a probability of at most~varepsilonvarepsilon. For the expected memory consumption, we show that whether it is infinite can be decided in polynomial time for stateless pPDA (pBPA) and in polynomial space for pPDA. We also provide an iterative method for approximating theexpectation. We show how to compute error bounds of our approximation methodand analyze its convergence speed. We prove that our method convergeslinearly, i.e., the number of accurate bits of the approximation is a linear function of the number of iterations
    corecore